Slope Centering: Making Shortcut Weights Effective
Abstract
Shortcut connections are a popular architectural feature of multi-layer perceptrons. It is generally assumed that by implementing a linear sub-mapping, shortcuts assist the learning process in the remainder of the network. Here we find that this is not always the case: shortcut weights may also act as distractors that slow down convergence and can lead to inferior solutions. This problem can be addressed with slope centering, a particular form of gradient factor centering [2]. By removing the linear component of the error signal at a hidden node, slope centering effectively decouples that node from the shortcuts that bypass it. This eliminates the possibility of destructive interference from shortcut weights, and thus ensures that the benefits of shortcut connections are fully realized.

1 Shortcuts

Shortcut weights bypass a given hidden node by connecting its inputs directly to the node(s) it projects to. They are a popular architectural feature of multi-layer perceptrons, in particular those with more than one hidden layer. They are generally thought to be beneficial to the learning process by providing a linear sub-network that a) backpropagates error gradients to preceding layers without the blurring and attenuation associated with the passage through a layer of hidden nodes, and b) frees the bypassed hidden node(s) from responsibility for the linear component (now implemented by the shortcuts) of the mapping that it is to learn.

Here we take a closer look at the second argument. It is true that with shortcuts added, a nonlinear network generally attains larger capacity and may therefore be able to better approximate a given mapping. What about the dynamics of gradient descent in such a network, though: how do shortcuts affect learning in the bypassed hidden node? Our experiments with single hidden layer networks, where backpropagation through shortcuts does not play a role, suggest that shortcuts actually slow down convergence, and may lead to inferior solutions (see Section 3). This should not come as a surprise; after all, the simultaneous adaptation of additional parameters (the shortcut weights) at non-infinitesimal step sizes
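To make the interaction concrete, the following is a minimal sketch (Python/NumPy, not taken from the paper) of a single-hidden-layer network with input-to-output shortcut weights, trained by plain batch gradient descent, in which the slope factor of the hidden units is centered before backpropagation. The centering statistic used here (each unit's mean slope over the batch), the toy data, and all names and hyperparameters are illustrative assumptions only.

# Illustrative sketch of slope centering with shortcut weights (assumptions:
# batch gradient descent, per-unit batch-mean slope as the centering term).
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: 100 patterns, 4 inputs, 1 output.
X = rng.standard_normal((100, 4))
y = np.tanh(X @ rng.standard_normal((4, 1))) + 0.1 * rng.standard_normal((100, 1))

n_in, n_hid, n_out = 4, 8, 1
W1 = 0.1 * rng.standard_normal((n_in, n_hid))    # input  -> hidden
W2 = 0.1 * rng.standard_normal((n_hid, n_out))   # hidden -> output
S  = 0.1 * rng.standard_normal((n_in, n_out))    # shortcut: input -> output
lr = 0.05

for step in range(1000):
    # Forward pass: nonlinear hidden path plus linear shortcut path.
    net_h = X @ W1
    h = np.tanh(net_h)
    out = h @ W2 + X @ S

    # Squared-error gradient at the output.
    delta_out = out - y

    # Backpropagate to the hidden layer using the *centered* slope: subtract
    # each hidden unit's mean activation-function slope over the batch, so the
    # linear component of the hidden unit's error is removed and left to the
    # shortcut weights S.
    slope = 1.0 - h**2                                  # tanh'(net_h)
    slope_centered = slope - slope.mean(axis=0, keepdims=True)
    delta_h = (delta_out @ W2.T) * slope_centered

    # Batch gradient-descent updates.
    W2 -= lr * h.T @ delta_out / len(X)
    S  -= lr * X.T @ delta_out / len(X)
    W1 -= lr * X.T @ delta_h / len(X)

In this sketch the hidden weights W1 no longer receive the mean-slope (linear) part of the error, which is instead carried entirely by the shortcut weights S, matching the decoupling described above.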
Similar articles
On Centering Neural Network Weight Updates
It has long been known that neural networks can learn faster when their input and hidden unit activities are centered about zero; recently we have extended this approach to also encompass the centering of error signals (Schraudolph and Sejnowski, 1996). Here we generalize this notion to all factors involved in the weight update, leading us to propose centering the slope of hidden unit activatio...
IDSIA-19-97, April 19, 1997, revised August 21, 1998: Centering Neural Network Gradient Factors
It has long been known that neural networks can learn faster when their input and hidden unit activities are centered about zero; recently we have extended this approach to also encompass the centering of error signals [2]. Here we generalize this notion to all factors involved in the network’s gradient, leading us to propose centering the slope of hidden unit activation functions as well. Slop...
Centering Neural Network Gradient Factors
It has long been known that neural networks can learn faster when their input and hidden unit activities are centered about zero; recently we have extended this approach to also encompass the centering of error signals [2]. Here we generalize this notion to all factors involved in the network's gradient, leading us to propose centering the slope of hidden unit activation functions as well. Slope...
Accelerated Gradient Descent by Factor-Centering Decomposition
Gradient factor centering is a new methodology for decomposing neural networks into biased and centered subnets which are then trained in parallel. The decomposition can be applied to any pattern-dependent factor in the network’s gradient, and is designed such that the subnets are more amenable to optimization by gradient descent than the original network: biased subnets because of their simpli...